Building Contextual Anchor Text Representation using Graph Regularization
Abstract
Anchor texts are a useful complementary description of their target pages and are widely applied to improve search relevance. The benefits come from the additional information they introduce into the document representation and from intelligent ways of estimating their relative importance. Previous work on anchor importance estimation treated each anchor text independently, without considering its context. Without constraints from that context, a stable anchor text representation cannot be guaranteed. We propose an anchor graph regularization approach that incorporates constraints from this context into the anchor text weighting process, casting the task as a convex quadratic optimization problem. The constraints are drawn from estimates of anchor-anchor, anchor-page, and page-page similarity. Working on top of any estimator, our approach operates as a post-processing step that refines the estimated anchor weights, making it a plug-and-play component in search infrastructure. Comparative experiments on standard data sets (TREC 2009 and 2010) demonstrate the efficacy of our approach.
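The abstract does not state the objective function, so the following is only a minimal sketch of what a graph-regularized refinement of anchor weights could look like, not the authors' formulation. It assumes a generic convex quadratic objective, sum_i (w_i - w0_i)^2 + lam * sum_{i<j} S_ij (w_i - w_j)^2, where w0 holds the weights produced by some upstream estimator and S is a symmetric, non-negative anchor-anchor similarity matrix. The function name `refine_anchor_weights`, the parameter `lam`, and the restriction to anchor-anchor similarity (leaving out the anchor-page and page-page terms) are all assumptions made for illustration.

```python
def refine_anchor_weights(initial, similarity, lam=1.0, iters=200, tol=1e-10):
    """Smooth anchor weights over a similarity graph (hypothetical sketch).

    Minimizes sum_i (w_i - initial_i)^2 + lam * sum_{i<j} S_ij (w_i - w_j)^2
    with Gauss-Seidel sweeps. `similarity` must be symmetric and non-negative.
    """
    n = len(initial)
    w = list(initial)
    for _ in range(iters):
        max_change = 0.0
        for i in range(n):
            # Total similarity to the other anchors, and their weighted pull.
            degree = sum(similarity[i][j] for j in range(n) if j != i)
            pulled = sum(similarity[i][j] * w[j] for j in range(n) if j != i)
            # Setting the gradient with respect to w_i to zero gives this update.
            new = (initial[i] + lam * pulled) / (1.0 + lam * degree)
            max_change = max(max_change, abs(new - w[i]))
            w[i] = new
        if max_change < tol:
            break
    return w


# Two anchors with similarity 1: the weights are pulled toward each other.
refined = refine_anchor_weights([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], lam=1.0)
print(refined)
```

Because the refinement only consumes the initial weights, it can sit behind any estimator, which is the plug-and-play property the abstract describes. With `lam=0` the initial weights are returned unchanged, and larger values of `lam` pull similar anchors toward a common weight.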
Similar resources
UWaterloo at NTCIR-9: Intent discovery with anchor text
This paper describes our submission to the Intent Discovery task at the NTCIR-9. By treating the source and target documents of anchor texts as nodes, we utilized the anchor texts between the nodes as edges in a documents–anchors graph representation of the corpus. We extracted and indexed anchor links information from the provided SogouT corpus. Using the queries, anchor texts are retrieved fr...
A Graph-Based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields
This paper presents a method for categorizing named entities in Wikipedia. In Wikipedia, an anchor text is glossed in a linked HTML text. We formalize named entity categorization as a task of categorizing anchor texts with linked HTML texts that gloss a named entity. Using this representation, we introduce a graph structure in which anchor texts are regarded as nodes. In order to incorporate...
Venue Recommendation and Web Search Based on Anchor Text
This paper presents the University of Amsterdam’s participation in TREC 2014. For the Contextual Suggestion Track, we experimented with the use of anchor text representations in the language modeling framework, and base our runs either on full ClueWeb12 or the subset of touristic aggregators (e.g., tripadvisor) provided by the organizers of the track. We also look at the effectiveness of priors...
University of Amsterdam at TREC 2014
This paper presents the University of Amsterdam’s participation in TREC 2014. For the Contextual Suggestion Track, we experimented with the use of anchor text representations in the language modeling framework, and base our runs either on full ClueWeb12 or the subset of touristic aggregators (e.g., tripadvisor) provided by the organizers of the track. We also look at the effectiveness of priors...
Query Suggestion Using Anchor Text
Many query suggestion techniques have been proposed to better capture user intent and to improve search effectiveness. However, most of these methods make use of query logs which are not readily available to the research community. Anchor text, on the other hand, is widely available and has proven useful for many tasks. In this paper, we investigate the problem of query suggestion via random wa...